<!DOCTYPE html>
<html lang="en"><head>
  <meta charset="utf-8">
  <meta http-equiv="X-UA-Compatible" content="IE=edge">
  <meta name="viewport" content="width=device-width, initial-scale=1">
  <link rel="shortcut icon" href="https://newsblur.com/media/img/favicon.ico" type="image/png" />
  <link rel="icon" href="https://newsblur.com/media/img/favicon_32.png" sizes="32x32"/>
  <link rel="icon" href="https://newsblur.com/media/img/favicon_64.png" sizes="64x64"/>
  <link rel="alternate" type="application/rss+xml" 
  title="The NewsBlur Blog RSS feed" 
  href="/feed.xml" /><!-- Begin Jekyll SEO tag v2.8.0 -->
<title>Improved Text view story extraction | The NewsBlur Blog</title>
<meta name="generator" content="Jekyll v4.3.4" />
<meta property="og:title" content="Improved Text view story extraction" />
<meta property="og:locale" content="en_US" />
<meta name="description" content="The Text view is one of the most popular NewsBlur features. It’s available on all three platforms and gives you the full text of the original story, even in truncated RSS feeds. Up until today, NewsBlur’s implementation of the Text view used Readability’s open source text extractor." />
<meta property="og:description" content="The Text view is one of the most popular NewsBlur features. It’s available on all three platforms and gives you the full text of the original story, even in truncated RSS feeds. Up until today, NewsBlur’s implementation of the Text view used Readability’s open source text extractor." />
<link rel="canonical" href="https://blog.newsblur.com/2017/10/24/improved-text-view-story-extraction/" />
<meta property="og:url" content="https://blog.newsblur.com/2017/10/24/improved-text-view-story-extraction/" />
<meta property="og:site_name" content="The NewsBlur Blog" />
<meta property="og:type" content="article" />
<meta property="article:published_time" content="2017-10-24T16:17:25-04:00" />
<meta name="twitter:card" content="summary" />
<meta property="twitter:title" content="Improved Text view story extraction" />
<script type="application/ld+json">
{"@context":"https://schema.org","@type":"BlogPosting","dateModified":"2017-10-24T16:17:25-04:00","datePublished":"2017-10-24T16:17:25-04:00","description":"The Text view is one of the most popular NewsBlur features. It’s available on all three platforms and gives you the full text of the original story, even in truncated RSS feeds. Up until today, NewsBlur’s implementation of the Text view used Readability’s open source text extractor.","headline":"Improved Text view story extraction","mainEntityOfPage":{"@type":"WebPage","@id":"https://blog.newsblur.com/2017/10/24/improved-text-view-story-extraction/"},"publisher":{"@type":"Organization","logo":{"@type":"ImageObject","url":"https://blog.newsblur.com/assets/newsblur_logo_512.png"}},"url":"https://blog.newsblur.com/2017/10/24/improved-text-view-story-extraction/"}</script>
<!-- End Jekyll SEO tag -->
<link rel="stylesheet" href="/assets/main.css">
  <link rel="stylesheet" type="text/css" href="https://cloud.typography.com/6565292/711824/css/fonts.css" />
   <link rel="stylesheet" type="text/css" href="https://cloud.typography.com/6565292/731824/css/fonts.css" /><link type="application/atom+xml" rel="alternate" href="https://blog.newsblur.com/feed.xml" title="The NewsBlur Blog" /></head>
<body><header class="site-header" role="banner">

  <div class="wrapper"><a class="site-title" rel="author" href="/">
      <div class="site-title-image">
        <img src="/assets/newsblur_logo_512.png">
      </div>
      <div class="site-title-text">The NewsBlur Blog</div>
    </a><nav class="site-nav">
        <input type="checkbox" id="nav-trigger" class="nav-trigger" />
        <label for="nav-trigger">
          <span class="menu-icon">
            <svg viewBox="0 0 18 15" width="18px" height="15px">
              <path d="M18,1.484c0,0.82-0.665,1.484-1.484,1.484H1.484C0.665,2.969,0,2.304,0,1.484l0,0C0,0.665,0.665,0,1.484,0 h15.032C17.335,0,18,0.665,18,1.484L18,1.484z M18,7.516C18,8.335,17.335,9,16.516,9H1.484C0.665,9,0,8.335,0,7.516l0,0 c0-0.82,0.665-1.484,1.484-1.484h15.032C17.335,6.031,18,6.696,18,7.516L18,7.516z M18,13.516C18,14.335,17.335,15,16.516,15H1.484 C0.665,15,0,14.335,0,13.516l0,0c0-0.82,0.665-1.483,1.484-1.483h15.032C17.335,12.031,18,12.695,18,13.516L18,13.516z"/>
            </svg>
          </span>
        </label>

        <div class="trigger"><a class="page-link" href="https://www.newsblur.com">Visit NewsBlur ➤</a></div>
      </nav></div>
</header>

<header class="site-subheader" role="banner">

  <div class="wrapper">
    <div class="top">
      NewsBlur is a personal news reader that brings people together to talk about the world.
    </div>
    <div class="bottom">
      A new sound of an old instrument.
    </div>
  </div>

</header>
<main class="page-content" aria-label="Content">
      <div class="wrapper">
        <article class="post h-entry" itemscope itemtype="http://schema.org/BlogPosting">

  <header class="post-header">
    <h1 class="post-title p-name" itemprop="name headline">Improved Text view story extraction</h1>
    <p class="post-meta">
      <time class="dt-published" datetime="2017-10-24T16:17:25-04:00" itemprop="datePublished">Oct 24, 2017
      </time></p>
  </header>

  <div class="post-content e-content" itemprop="articleBody">
    <p>The Text view is one of the most popular NewsBlur features. It’s available on all three platforms and gives you the full text of the original story, even in truncated RSS feeds. Up until today, NewsBlur’s implementation of the Text view used Readability’s open source text extractor.</p>

<p>Starting today, all stories will be run through <a href="https://mercury.postlight.com/web-parser/">Postlight Labs’ Mercury Parser</a>. That means the Text view is more likely to pull the entire article correctly, and it also does a much better job of extracting full-size images in stories.</p>

<p>Take a look:</p>

<p><img src="https://s3.amazonaws.com/static.newsblur.com/blog/text_view_images.png" alt="" /></p>

<p>A welcome improvement. The new text extractor and parser also does a better job of handling Unicode text, including Chinese characters. And when it doesn’t extract text as well as the old text extractor, NewsBlur will automatically fall back on the old method.</p>


  </div><a class="u-url" href="/2017/10/24/improved-text-view-story-extraction/" hidden></a>
</article>

      </div>
    </main><footer class="site-footer h-card">
  <data class="u-url" href="/"></data>

  <div class="wrapper">

    <h2 class="footer-heading">The NewsBlur Blog</h2>

    <div class="footer-col-wrapper">
      

      <div class="footer-col footer-col-1"><ul class="social-media-list"><li><a href="https://github.com/samuelclay"><svg class="svg-icon"><use xlink:href="/assets/minima-social-icons.svg#github"></use></svg> <span class="username">samuelclay</span></a></li><li><a href="https://www.twitter.com/newsblur"><svg class="svg-icon"><use xlink:href="/assets/minima-social-icons.svg#twitter"></use></svg> <span class="username">newsblur</span></a></li><li><a href="mailto:blog@newsblur.com?subject=Hello from the NewsBlur blog"><svg class="svg-icon"><use xlink:href="/assets/minima-social-icons.svg#email"></use></svg> <span class="username">blog@newsblur.com</span></a></li></ul>
</div>

    </div>

  </div>

</footer>
</body>

</html>
